The CompleteSearch Engine: Interactive, Efficient, and Towards IR & DB Integration
Abstract
We describe CompleteSearch, an interactive search engine that offers the user a variety of complex features, which at first glance have little in common, yet are all provided via one and the same highly optimized core mechanism. This mechanism answers queries for what we call context-sensitive prefix search and completion: given a set of documents and a word range, compute all words from that range which are contained in one of the given documents, as well as those of the given documents which contain a word from the given range. Among the supported features are: (i) automatic query completion, for example, find all completions of the prefix “seman” that occur in the context of the word “ontology”, as well as the best hits for any such completion; (ii) semi-structured (XML) retrieval, for example, find all email messages with “dbworld” in the subject line; (iii) semantic search, for example, find all politicians who had a private audience with the pope; (iv) DB-style joins and grouping, for example, find the most prolific authors with at least one paper in both “SIGMOD” and “SIGIR”; and (v) arbitrary combinations of these. The prefix search and completion mechanism of CompleteSearch is realized via a novel kind of index data structure, which enables subsecond query processing times for collections of up to a terabyte of data on a single PC. We report on a number of lessons learned in the process of building the system and on our experience with a number of publicly used deployments.
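The context-sensitive prefix search and completion query defined in the abstract can be illustrated with a small sketch. This is not the novel index structure the abstract refers to; it is a naive version built on a plain word-to-documents map, and the names `build_index` and `prefix_search_and_complete`, as well as the toy documents, are assumptions made purely for illustration. The word range is taken to be the set of words sharing a given prefix.

```python
from bisect import bisect_left


def build_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def prefix_search_and_complete(index, vocabulary, doc_ids, prefix):
    """Naive context-sensitive prefix search and completion.

    vocabulary is the sorted list of all indexed words, doc_ids is the
    given set of documents (the context), and prefix defines the word range.
    Returns the words from the range that occur in at least one of the
    given documents, and the given documents that contain such a word.
    """
    completions = []
    matching_docs = set()
    # Words sharing a prefix are contiguous in a sorted vocabulary.
    i = bisect_left(vocabulary, prefix)
    while i < len(vocabulary) and vocabulary[i].startswith(prefix):
        hits = index[vocabulary[i]] & doc_ids
        if hits:
            completions.append(vocabulary[i])
            matching_docs |= hits
        i += 1
    return completions, matching_docs


# Hypothetical toy collection, for illustration only.
docs = {
    1: "semantic web ontology",
    2: "ontology semantics matter",
    3: "seminar on databases",
    4: "semantic search engines",
}
index = build_index(docs)
vocabulary = sorted(index)

# Context: the documents containing "ontology"; word range: prefix "seman".
context = index["ontology"]
completions, matching = prefix_search_and_complete(index, vocabulary, context, "seman")
print(completions, sorted(matching))  # ['semantic', 'semantics'] [1, 2]
```

Document 4 contains "semantic" but is excluded because it lies outside the context set. Each step of a multi-word query can feed its matching documents into the next step as the new context, which is one way to read the abstract's claim that many features reduce to this single operation.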
Similar resources
Integration of exhaust manifold with engine cylinder head towards size and weight reduction
In this research, a new exhaust manifold and its cooling jackets are first designed for an exhaust manifold integrated into the cylinder head (IEMCH) of a turbocharged engine. Then, gas exchange and flow analyses are carried out numerically to evaluate the proper conditions for the exhaust gas and the coolant stream, respectively. Finally, the entire engine parts are thermally analyzed to assure...
A Confluence of Column Stores and Search Engines - Opportunities and Challenges
IR and DB integration has been a long-standing research challenge. Most of the work trying to integrate the two fields is motivated by specific application scenarios. In this paper we approach this problem from another perspective. Instead of focusing on IR and DB as whole fields, we restrict the focus to search engines and column stores. We present observations of similarities in the two t...
A New DBMS Architecture for DB-IR Integration
Nowadays, as there is an increasing need to integrate the DBMS (for structured data) with Information Retrieval (IR) features (for unstructured data), DB-IR integration has become one of the major challenges in the database area [1,2]. Extensible architectures provided by commercial ORDBMS vendors can be used for DB-IR integration. Here, extensions are implemented using a high-level (typically, SQL-lev...
Efficient index structures for and applications of the CompleteSearch engine
Traditional search engines, such as Google, offer response times well under one second, even for a corpus with more than a billion documents. They achieve this by making use of a (parallelized) inverted index. However, the inverted index is primarily designed to efficiently process simple keyword queries, which is why search engines rarely offer support for queries which cannot be (re-)formula...
TopX - Efficient and Versatile Top-k Query Processing for Text, Semistructured, and Structured Data
This paper presents a comprehensive overview of the TopX search engine, an extensive framework for unified indexing and querying large collections of unstructured, semistructured, and structured data. Residing at the very synapse of database (DB) engineering and information retrieval (IR), it integrates efficient scheduling algorithms for top-k-style ranked retrieval with powerful scoring model...
Publication date: 2007